
[feat] completion api supports passing input token ids in either prompt or prompt_token_ids #3311


Open: wants to merge 10 commits into base: develop

liyonghua0910
Collaborator

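Requirements

The Completion API needs to accept token ids passed directly in the prompt field as model input, matching vLLM's behavior. The prompt_token_ids field added in FD v2.0.4 remains valid, and for now it takes priority over prompt (prompt_token_ids > prompt). In addition, prompt_token_ids previously supported only single-request inference; this PR adds support for batch inference.

Single-request inference: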

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": [123, 456, 789],
        "max_tokens": 10
    }'

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "",
        "prompt_token_ids": [123, 456, 789],
        "max_tokens": 10
    }'

Batch inference:

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": [[123, 456, 789], [987, 654, 321]],
        "max_tokens": 10
    }'

curl -X POST "http://0.0.0.0:8185/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "",
        "prompt_token_ids": [[123, 456, 789], [987, 654, 321]],
        "max_tokens": 10
    }'
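
The four request shapes above can be reduced to one normalization step. The following is a minimal sketch, not the code in this PR: `resolve_inputs`, `_to_batch`, and `_is_token_ids` are hypothetical names, and the validation rules (non-empty lists, integer ids) are assumptions. It shows the stated priority (prompt_token_ids > prompt) and how a flat list of ids can be told apart from a batch.

```python
from typing import List, Union

# One input item is either raw text or a list of token ids.
PromptItem = Union[str, List[int]]


def _is_token_ids(value) -> bool:
    """True for a non-empty flat list of ints, e.g. [123, 456, 789]."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(t, int) and not isinstance(t, bool) for t in value)
    )


def _to_batch(value) -> List[PromptItem]:
    """Wrap a single input into a one-element batch; pass batches through."""
    if isinstance(value, str) or _is_token_ids(value):
        return [value]
    if (
        isinstance(value, list)
        and value
        and all(isinstance(v, str) or _is_token_ids(v) for v in value)
    ):
        return list(value)
    raise ValueError("expected str, list[int], list[str] or list[list[int]]")


def resolve_inputs(body: dict) -> List[PromptItem]:
    """Hypothetical helper: pick the field to use and return a batch of inputs."""
    token_ids = body.get("prompt_token_ids")
    if token_ids:
        # Assumed rule from the description: prompt_token_ids wins over prompt.
        batch = _to_batch(token_ids)
        if any(isinstance(item, str) for item in batch):
            raise ValueError("prompt_token_ids must contain token ids only")
        return batch
    return _to_batch(body.get("prompt"))
```

With this shape, a request such as `{"prompt": "", "prompt_token_ids": [[123, 456, 789], [987, 654, 321]]}` yields a batch of two token-id lists, and the empty prompt string is ignored.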

Main changes

  • fastdeploy/entrypoints/openai/serving_completion.py: adds handling for batch inference via prompt_token_ids, which takes priority over the prompt field
  • fastdeploy/input/ernie_processor.py: refactors the handling of the prompt field. If a prompt is a list, it is written directly to request.prompt_token_ids; if it is a str, it is tokenized first and the result is written to request.prompt_token_ids. Both the process_request and process_request_dict methods are updated
  • fastdeploy/input/text_processor.py: same as above
  • test/ci_use/EB_Lite/test_EB_Lite_serving.py: adds test cases for passing token ids directly in the prompt field and for batch inference through the prompt and prompt_token_ids fields
  • test/ci_use/Qwen2-7B-Instruct_serving/test_Qwen2-7B-Instruct_serving.py: same as above
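
The processor-side branching described in the list above might look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's implementation: `fill_prompt_token_ids` is a hypothetical function, the request is modeled as a plain dict, and `toy_tokenize` stands in for the real tokenizer.

```python
from typing import Any, Callable, Dict, List


def fill_prompt_token_ids(
    request: Dict[str, Any],
    tokenize: Callable[[str], List[int]],
) -> Dict[str, Any]:
    """Hypothetical helper: make sure request["prompt_token_ids"] is populated."""
    if request.get("prompt_token_ids"):
        # Already supplied by the caller; assumed to take priority over prompt.
        return request
    prompt = request.get("prompt")
    if isinstance(prompt, list):
        # The prompt already holds token ids: copy them through untouched.
        request["prompt_token_ids"] = list(prompt)
    elif isinstance(prompt, str):
        # The prompt is text: tokenize it first.
        request["prompt_token_ids"] = tokenize(prompt)
    else:
        raise ValueError("prompt must be a str or a list of token ids")
    return request


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer for the example only: one id per character."""
    return [ord(ch) for ch in text]
```

Passing the tokenizer in as a callable keeps the branching logic independent of any particular model, which would let the same helper serve both processors mentioned above.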


paddle-bot bot commented Aug 11, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Aug 11, 2025